AFOSR-TR-77-0688




SOME  CONSIDERATIONS  IN  THE  EVALUATION 
OF  ALTERNATE  PREDICTION  EQUATIONS 


Richard F. Gunst
Department of Statistics
Southern Methodist University
Dallas, Texas 75275

and

Robert L. Mason
Automotive Research Division
Southwest Research Institute
San Antonio, Texas 78284


Prediction equations constructed from multiple linear regression
analyses are often intended for use in predicting response values
throughout a region of the space of the predictor variables. Criteria
for evaluating prediction equations, however, have generally
concentrated attention on mean squared error properties of the estimated
regression coefficients or on mean squared error properties of the
predictor at the design points. If adequate prediction throughout
a region of the space of predictor variables is the goal, neither of
these criteria may be satisfactory in assessing the predictor. In
this paper integrated mean squared error is used as a criterion to
determine when the least squares, principal component, and ridge
regression estimators of regression coefficients can produce
satisfactory prediction equations in the presence of a multicollinear
design matrix.


1.  INTRODUCTION

Box and Draper [1] espouse the use of an integrated mean squared
error criterion to evaluate experimental designs proposed for use in
fitting response surface models. Specifically, Box and Draper are
concerned with the appropriate selection of design points
$(X_{i1}, \ldots, X_{ip})$, i = 1, 2, ..., n, so that the mean squared
error of the least squares prediction equation, integrated over an
appropriate region of interest in the p design variables, is suitably
small. Thus, experimental designs can be evaluated with respect to
(i) the variance of the fitted model, (ii) bias incurred when an
incorrect functional form is assumed between the response variable
and the design variables, (iii) particular regions of interest of the
design variables, and (iv) weighting functions that enable some
regions of the predictor variables to influence the integrated mean
squared error more heavily than others. The flexibility and intuitive
appeal of integrated mean squared error have resulted in several
subsequent papers evaluating both response surface designs (e.g. [2],
[4], [5], [6], [7]) and estimators of the response function (e.g.
[3], [10], [13], [14], [15], [16], [21]).

The purpose of this paper is to show that integrated mean squared
error is a valuable tool in evaluating prediction equations arising
from the use of different estimators of the unknown parameters in
multiple linear regression models. The situation discussed in this
paper differs from the one posed in most of the above articles in
that we assume the data analyst has no control over the predictor
(design) variables; i.e., the experimenter cannot select the values
of the predictor variables for which data on the response variable
are to be obtained. This type of data is often characterized by the
occurrence of multicollinearities among the predictor variables and
consequent poor prediction when least squares prediction equations
are employed (see, for example, [11], [19], and [20]). For these
reasons, biased regression estimators have become very popular when
regression data are multicollinear.

Hocking [11] and Gunst and Mason [8] reference much of the
literature dealing with comparisons of estimators of regression
coefficients. Overwhelmingly, these articles deal with comparisons using
the mean squared errors of the estimated regression coefficients,
although a few authors examine pointwise mean squared errors of the
prediction equations at the design points. Yet the potential
advantages of assessing a prediction equation using integrated mean squared
error are many: the variances of the prediction equations are
included in the assessment, biases due to the use of biased regression
estimators and also due to misspecifications of the model can be
evaluated, regions of the space of predictor variables for which
one estimator has smaller integrated mean squared error than another
can be identified, and unequal weightings can be assigned to regions
of the space of predictor variables to reflect different requirements
for accurate prediction.

Section 2 of this paper models the problem addressed in this
paper and contrasts it with the one considered by Box and Draper.
One important distinction noted between regression analysis and
choosing an experimental design to estimate response surface models
is that, unlike the design problem, multicollinearities among the
predictor variables in a regression analysis frequently cause the
variance portion of the integrated mean squared error to be much
larger than the bias due to using a biased estimator of the regression
coefficients. In Sections 3, 4, and 5, general expressions for
integrated variance, integrated squared bias, and integrated mean
squared error, respectively, are presented, along with specific
results for models with two predictor variables. Section 6 briefly
discusses estimation of integrated mean squared error and presents
a numerical example. Conclusions and recommendations for further
research are given in Section 7.


2.  THE  PROBLEM 


We restrict attention in this article to multiple linear
regression models of the following form:

$$ Y = \beta_0 1 + X\beta + \varepsilon, \tag{2.1} $$

where $Y$ is an (n x 1) vector of response variables, $\beta_0$ is an unknown
constant, $1$ is an (n x 1) vector of ones, $X = [X_1, X_2, \ldots, X_p]$ is
an (n x p) full column rank matrix of known nonstochastic predictor
variables, $\beta$ is a (p x 1) vector of unknown regression coefficients,
and $\varepsilon$ is an (n x 1) vector of random error terms with
$\varepsilon \sim N(0, \sigma^2 I)$.
Except for the example discussed later in this section, we assume
that model (2.1) has been correctly specified by the experimenter
and that the columns of X have been standardized so that $X_j' 1 = 0$
and $X_j' X_j = 1$ for j = 1, 2, ..., p.

Consider a prediction equation of the form

$$ \hat{Y}(u) = \hat{\beta}_0 + u'\hat{\beta}, \tag{2.2} $$

where $u' = (u_1, u_2, \ldots, u_p)$ is a vector of standardized (as in
(2.1)) values of the p predictor variables at which a predicted
value of the response variable is desired, and $\hat{\beta}_0$ and $\hat{\beta}$ are estimators
of the unknown constants in (2.1). If $X_j' 1 = 0$, i.e. the predictor
variables are centered, then we will use $\hat{\beta}_0 = \bar{Y}$.

If  the  prediction  equation  (2.2)  is  to  be  used  for  a range  of 
values  of  the  predictor  variables,  some  measure  of  the  adequacy  of 
prediction  throughout  this  region  of  the  predictor  variable  space  is 
needed  to  assess  its  efficacy.  One  such  measure  is  integrated  mean 
squared  error,  J,  defined  as 


$$ J = \int \cdots \int_R E\{\hat{Y}(u) - E[Y(u)]\}^2 \, W(u) \, du. \tag{2.3} $$

As defined in (2.3), integrated mean squared error incorporates the
mean squared error of the prediction equation at the point u, i.e.
$E\{\hat{Y}(u) - E[Y(u)]\}^2$, weighted by an appropriate function W(u) and
integrated over a region R = R(u). This definition of integrated
mean squared error can be adapted to discrete weight functions and
models in which some predictor variables are functionally related to
one another; however, we will restrict our attention to continuous
predictor variables for simplicity (Helms [9] treats some of the
complications of the more general definitions of J).
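
As a rough illustration of (2.3), the following sketch approximates J
numerically for a one-predictor least squares line, using a uniform
weight on R = [-1, 1] that integrates to one. The design values `x`,
the variance `sigma2`, and the function names are illustrative
assumptions, not quantities taken from this paper; the pointwise mean
squared error used is the usual variance of a fitted straight line
when the linear model is correct and the predictor is centered.

```python
def pointwise_mse(u, x, sigma2):
    """Variance of the LS fitted line at u for centered x (no bias if the model is correct)."""
    n = len(x)
    sxx = sum(v * v for v in x)
    return sigma2 * (1.0 / n + u * u / sxx)


def integrated_mse(x, sigma2, lower=-1.0, upper=1.0, steps=2000):
    """Midpoint-rule approximation of J with uniform weight W(u) = 1 / (upper - lower)."""
    width = (upper - lower) / steps
    weight = 1.0 / (upper - lower)
    total = 0.0
    for i in range(steps):
        u = lower + (i + 0.5) * width
        total += pointwise_mse(u, x, sigma2) * weight * width
    return total


# Hypothetical centered design (values sum to zero), chosen only for illustration.
x = [-1.0, -0.5, 0.0, 0.5, 1.0]
sigma2 = 1.0
j_numeric = integrated_mse(x, sigma2)
# Closed form for this simple case: sigma2 * (1/n + (1/3) / sum(x^2)).
j_exact = sigma2 * (1.0 / len(x) + (1.0 / 3.0) / sum(v * v for v in x))
print(j_numeric, j_exact)
```

The numerical and closed-form values should agree closely; other
weight functions can be tried by changing `weight` inside the loop.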

Box and Draper [1] analyzed in some detail the choice of an
experimental design for fitting a quadratic response surface

$$ Y = \beta_0 + \beta_1 X_1 + \beta_{11} X_1^2 + \varepsilon \tag{2.4} $$

when it was incorrectly assumed that the response surface was linear,
i.e.

$$ Y = \beta_0 + \beta_1 X_1 + \varepsilon. \tag{2.5} $$

We wish to discuss this example to point out the differences that
occur when one can choose the design points and then estimate the
regression coefficients versus the problems that arise when one cannot
do so. Box and Draper assumed that the design points could be centered
so that $\sum_{i=1}^{n} X_{i1} = 0$ and that the region of interest R could be chosen
(through a scaling of $X_1$) to be $-1 \le X_1 \le 1$. They also chose the
weight function to be constant throughout R; in particular, they let
$W(u) = n\sigma^{-2} [\int_{-1}^{1} du_1]^{-1} = n\sigma^{-2}/2$. Using least squares estimators of
$\beta_0$ and $\beta_1$ it is then easily verified that (note we have not assumed
that $\sum_{i=1}^{n} X_{i1}^2 = 1$ in this example)

$$ J = V + B = \left(1 + \frac{1}{3c}\right) + \alpha_{11}^2 \left\{c^2 - \frac{2c}{3} + \frac{1}{5} + \frac{d^2}{3c^2}\right\}, \tag{2.6} $$

where the first term on the r.h.s. of (2.6) is the integrated
variance (V) of the prediction equation, while the second term is the
integrated squared bias (B), with $\alpha_{11}^2 = n\sigma^{-2}\beta_{11}^2$,
$c = n^{-1}\sum_{i=1}^{n} X_{i1}^2$, and $d = n^{-1}\sum_{i=1}^{n} X_{i1}^3$.

In choosing the design points to minimize (2.6), Box and Draper
noted that, regardless of the value of c, J would be smallest when
d = 0. In selecting the value of c to minimize J, however, one must
specify a value for $\alpha_{11}^2$. Alternatively, if V and B are restricted,
unique values of c and $\alpha_{11}^2$ can be found to minimize J. For example,
if one considers a situation in which the variance and bias terms
in (2.6) are equal, one can solve for the values of c and $\alpha_{11}^2$ that
minimize J subject to the restriction V = B.

Box and Draper investigated this further by solving for the
minimizing values of c for four cases: $V = \infty$, B = 0 ($c^{1/2} = \infty$);
V = 4B ($c^{1/2} = 0.72$); V = B ($c^{1/2} = 0.62$); and V = 0, $B = \infty$ ($c^{1/2} = 0.58$).
By noting the similarity of the values of $c^{1/2}$ in the last three cases
versus $c^{1/2} = \infty$ for the first one, they reached the rather surprising
conclusion that when the true model is quadratic but one assumes a
linear one, designs that incorporate contributions from both variance
and bias in the minimization of integrated mean squared error are
very similar to those that ignore variance completely and minimize
the integrated squared bias.
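
The quoted values of $c^{1/2}$ can be checked with a short calculation.
The sketch below takes (2.6) with d = 0, assumes that for each case
$\alpha_{11}^2$ is the value at which the chosen c is a stationary point of J,
and then uses bisection to find the c at which V equals a given
multiple of B. The function names, the bisection bounds, and this
reading of how the cases are defined are assumptions made for
illustration.

```python
import math


def variance_term(c):
    """Integrated variance V in (2.6)."""
    return 1.0 + 1.0 / (3.0 * c)


def bias_bracket(c):
    """Bracketed factor of the integrated squared bias in (2.6) with d = 0."""
    return c * c - 2.0 * c / 3.0 + 0.2


def alpha_sq_for_optimal(c):
    """alpha^2 at which dJ/dc = 0 for this c (requires c > 1/3)."""
    return 1.0 / (6.0 * c * c * (c - 1.0 / 3.0))


def optimal_c_for_ratio(ratio, lo=1.0 / 3.0 + 1e-9, hi=50.0, iterations=200):
    """Bisection for the c at which the J-minimizing design has V = ratio * B."""
    def gap(c):
        b = alpha_sq_for_optimal(c) * bias_bracket(c)
        return variance_term(c) - ratio * b

    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0.0:
            lo = mid   # bias still too large relative to variance: move right
        else:
            hi = mid
    return 0.5 * (lo + hi)


for ratio in (4.0, 1.0):
    c = optimal_c_for_ratio(ratio)
    print("V =", ratio, "* B gives c^(1/2) =", round(math.sqrt(c), 3))
# The all-bias limit minimizes the bracket alone, which occurs at c = 1/3.
print("all-bias c^(1/2) =", round(math.sqrt(1.0 / 3.0), 3))
```

Under these assumptions the computed square roots should fall close
to the values 0.72, 0.62, and 0.58 cited above.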

Through other examples Box and Draper and subsequent authors
showed that this same conclusion (i.e. optimal designs that
incorporate both integrated variance and integrated squared bias when
minimizing J are close to the all bias designs) is true in a variety
of response surface situations. The major distinction between these
examples and a regression analysis is the inability of the
experimenter in the latter instance to select the design points. In
particular the data analyst performing a regression analysis generally
cannot guarantee, as did Box and Draper, that the columns of X
are mutually orthogonal or that odd sample moments of the $X_j$ are zero.
The effects of the nonorthogonality of the columns of X on the
conclusions of Box and Draper can be illustrated by a simple extension
of the above example.

Suppose that, instead of considering the one variable model
(2.4), the true model relates the response variable to two predictor
variables as follows:

$$ Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_{11} X_1^2 + \beta_{22} X_2^2 + \varepsilon. \tag{2.7} $$

Again, unknowingly, suppose the experimenter assumes the model is
linear in the two predictor variables, i.e. he assumes

$$ Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon. \tag{2.8} $$

In order to make the comparison as similar as possible to the one
variable example considered above, also assume that

$$ \sum X_{i1} = \sum X_{i2} = \sum X_{i1}^2 X_{i2} = \sum X_{i1} X_{i2}^2 = \sum X_{i1}^3 = \sum X_{i2}^3 = 0 \tag{2.9} $$

and

$$ n^{-1} \sum X_{i1}^2 = n^{-1} \sum X_{i2}^2 = c, $$

with all the summations taken from i = 1 to i = n. Finally, we
do not assume $n^{-1} \sum X_{i1} X_{i2} = 0$, but that $n^{-1} \sum X_{i1} X_{i2} = c\,r_{12}$.
With these assumptions, and letting $R = \{(X_1, X_2) : -1 \le X_i \le 1,\ i = 1, 2\}$
and $W(u) = n\sigma^{-2}/4$,

$$ J = V + B = \left(1 + \frac{2}{3c(1 - r_{12}^2)}\right)
   + (\alpha_{11}^2 + \alpha_{22}^2)\{(c - 1/3)^2 + 4/45\} + 2\alpha_{11}\alpha_{22}(c - 1/3)^2, $$

where $\alpha_{jj}^2 = n\sigma^{-2}\beta_{jj}^2$, j = 1, 2. Further simplifying the problem by
letting $\alpha_{11} = \alpha_{22} = \alpha$ yields

$$ J = V + B = \left(1 + \frac{2}{3c(1 - r_{12}^2)}\right) + \alpha^2\{4(c - 1/3)^2 + 8/45\}. \tag{2.10} $$

If $r_{12} = 0$, (2.10) is the two variable analog of (2.6) (with d = 0).

The value of $c^{1/2}$ that minimizes (2.10) when $r_{12} = 0$ and V = B
is 0.61 and $\alpha = 3.892$. With these values of c and $\alpha$, V = B = 2.788.
As a function of $r_{12}$, the values of $c^{1/2}$ that minimize (2.10) for
this value of $\alpha$ are $c^{1/2}$ = 0.68 ($r_{12}$ = .90), 0.73 ($r_{12}$ = .95),
0.89 ($r_{12}$ = .99), and 1.23 ($r_{12}$ = .999). In addition, if $c^{1/2}$ =
0.61 is used to construct an experimental design for this example
but $r_{12} \ne 0$, the integrated squared bias, B, remains constant (since
B is not a function of $r_{12}$) but the integrated variance becomes
V = 10.409 ($r_{12}$ = .90), 19.336 ($r_{12}$ = .95), 90.839 ($r_{12}$ = .99), and
895.342 ($r_{12}$ = .999). Finally, if B = 2.788, the values of $c^{1/2}$ needed
to ensure that V = B are $c^{1/2}$ = 0.61 ($r_{12}$ = 0), 1.40 ($r_{12}$ = .90), 1.96
($r_{12}$ = .95), 4.33 ($r_{12}$ = .99), and 13.66 ($r_{12}$ = .999).
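
The variance figures in this paragraph follow directly from the
variance term of (2.10). The sketch below, with function names chosen
here for illustration, recomputes the integrated variance when the
design for $r_{12} = 0$ is retained, and the value of $c^{1/2}$ that
would restore V = 2.788 at each $r_{12}$. Small differences from the
quoted figures are to be expected because 2.788 is itself rounded.

```python
import math


def integrated_variance(c, r12):
    """Variance part V of (2.10)."""
    return 1.0 + 2.0 / (3.0 * c * (1.0 - r12 ** 2))


def c_for_variance(target_v, r12):
    """Value of c at which V in (2.10) equals target_v."""
    return 2.0 / (3.0 * (target_v - 1.0) * (1.0 - r12 ** 2))


target = 2.788                       # V = B quoted in the text for r12 = 0
c0 = c_for_variance(target, 0.0)     # design chosen as if r12 = 0
print("c^(1/2) at r12 = 0:", round(math.sqrt(c0), 3))
for r12 in (0.90, 0.95, 0.99, 0.999):
    v = integrated_variance(c0, r12)                  # variance if that design is kept
    root_c = math.sqrt(c_for_variance(target, r12))   # c^(1/2) restoring V = B
    print(r12, round(v, 3), round(root_c, 3))
```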

Thus in selecting an experimental design for fitting (2.8) when
(2.7) is the correct model, the integrated variance cannot be ignored
if $r_{12}$ is close to 1.0. Values of $r_{12}$ near 1.0 frequently arise with
regression data and can result in an extremely large integrated
variance for the least squares prediction equation, even if the model is
correctly specified. Thus, although biased regression estimators
contribute nonzero integrated squared biases to J, the reduction in
integrated variance over the least squares estimator can result in an
overall reduction in integrated mean squared error with multicollinear
data. The magnitude of a reduction in integrated mean squared error
over least squares, if any, will depend on the region of interest
and the weight function used in (2.3), as well as on the particular
biased estimator employed.

3.  INTEGRATED  VARIANCE


The integrated mean squared error (2.3) of a prediction equation
can be partitioned into two components: integrated variance,
V, and integrated squared bias, B, where

$$ V = \int \cdots \int_R \operatorname{Var}\{\hat{Y}(u)\} \, W(u) \, du \tag{3.1} $$

and

$$ B = \int \cdots \int_R \{E[\hat{Y}(u)] - E[Y(u)]\}^2 \, W(u) \, du. \tag{3.2} $$


The three regression estimators to be compared in this article are
the ordinary least squares (LS), simple ridge regression (RR), and
principal component (PC) estimators defined in equations (3.4) - (3.6).
To facilitate the evaluation of the associated prediction equations,
define the latent roots of X'X by $\ell_1 \le \ell_2 \le \cdots \le \ell_p$ and the
corresponding orthonormal latent vectors by $V_1, V_2, \ldots, V_p$. It is
well known, then, that

$$ X'X = \sum_{j=1}^{p} \ell_j V_j V_j' \quad \text{and} \quad (X'X)^{-1} = \sum_{j=1}^{p} \ell_j^{-1} V_j V_j'. \tag{3.3} $$

Using (3.3), the LS estimator of $\beta$ can be written as

$$ \hat{\beta}_{LS} = (X'X)^{-1} X'Y = \sum_{j=1}^{p} \ell_j^{-1} c_j V_j, \tag{3.4} $$

where $c_j = V_j' X'Y$. The PC estimator we will examine deletes terms
from (3.4) corresponding to multicollinearities among the predictor
variables as indicated by the presence of small latent roots of X'X
(see Marquardt [18], Mansfield [17], and Gunst and Mason [8] for a
more complete discussion of this estimator, including justification
for deleting terms solely on the basis of the magnitudes of the latent
roots of X'X). If the terms corresponding to r small latent roots
are deleted from (3.4) the resulting PC estimator of $\beta$ is

$$ \hat{\beta}_{PC} = \sum_{j=r+1}^{p} \ell_j^{-1} c_j V_j. \tag{3.5} $$

Finally, for k > 0 the RR estimator (Hoerl and Kennard [12]) is given
by

$$ \hat{\beta}_{RR} = (X'X + kI)^{-1} X'Y = \sum_{j=1}^{p} (\ell_j + k)^{-1} c_j V_j. \tag{3.6} $$

In deriving the integrated mean squared error of the RR estimator we
assume that k is a fixed constant, i.e. nonstochastic. Although in
practice k is typically selected according to a stochastic method
(e.g. Ridge Trace, k estimated using $\hat{\beta}$ and $\hat{\sigma}^2$, etc.), nonrandom
selection rules utilizing only characteristics of X are conceivable.
One such procedure is introduced in Section 6.
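
To make the latent root forms (3.4) - (3.6) concrete, the following
sketch evaluates the three estimators for p = 2 standardized
predictors, where the latent roots and vectors of X'X are available
in closed form. The data, the ridge constant k = 0.05, and all
function names are hypothetical and serve only to exercise the
formulas; the code also assumes $0 < r_{12} < 1$ and that PC deletes
the single term belonging to the smaller root.

```python
import math


def standardize(column):
    """Center a column and scale it to unit length, as assumed for X in (2.1)."""
    mean = sum(column) / len(column)
    centered = [value - mean for value in column]
    length = math.sqrt(sum(value * value for value in centered))
    return [value / length for value in centered]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def two_predictor_estimators(x1, x2, y, k):
    """Return (LS, PC, RR) coefficient pairs for p = 2, following (3.4) - (3.6)."""
    r12 = dot(x1, x2)
    roots = (1.0 - r12, 1.0 + r12)                 # latent roots of X'X
    s = 1.0 / math.sqrt(2.0)
    vectors = ((s, -s), (s, s))                    # matching orthonormal latent vectors
    xty = (dot(x1, y), dot(x2, y))
    c = [dot(vector, xty) for vector in vectors]   # c_j = V_j' X'Y

    def combine(weights):
        # Weighted sum of c_j V_j; the weights distinguish the three estimators.
        return tuple(
            sum(w * cj * vector[i] for w, cj, vector in zip(weights, c, vectors))
            for i in range(2)
        )

    ls = combine([1.0 / root for root in roots])
    pc = combine([0.0, 1.0 / roots[1]])            # term for the smaller root deleted
    rr = combine([1.0 / (root + k) for root in roots])
    return ls, pc, rr


# Hypothetical, nearly collinear data used only to exercise the formulas.
x1 = standardize([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = standardize([1.1, 1.9, 3.2, 3.9, 5.1, 6.0])
y = [2.0, 4.1, 5.9, 8.2, 9.9, 12.1]
ls, pc, rr = two_predictor_estimators(x1, x2, y, k=0.05)
print(ls, pc, rr)
```

With k = 0 the RR weights reduce to the LS weights, which gives a
simple consistency check on the implementation.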

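Assuming that (2.1) is the correct model, the integrated variances
of the above three estimators are, respectively,

$$ V_{LS} = n^{-1}\sigma^2 + \sigma^2 \operatorname{tr}\{(X'X)^{-1}\Phi\}
        = n^{-1}\sigma^2 + \sigma^2 \sum_{j=1}^{p} \ell_j^{-1} V_j'\Phi V_j, \tag{3.7} $$

$$ V_{PC} = n^{-1}\sigma^2 + \sigma^2 \operatorname{tr}\{(X'X)^{-}\Phi\}
        = n^{-1}\sigma^2 + \sigma^2 \sum_{j=r+1}^{p} \ell_j^{-1} V_j'\Phi V_j, \tag{3.8} $$

and

$$ V_{RR} = n^{-1}\sigma^2 + \sigma^2 \operatorname{tr}\{(X'X + kI)^{-1} X'X (X'X + kI)^{-1}\Phi\}
        = n^{-1}\sigma^2 + \sigma^2 \sum_{j=1}^{p} \ell_j(\ell_j + k)^{-2} V_j'\Phi V_j, \tag{3.9} $$

where $(X'X)^{-} = \sum_{j=r+1}^{p} \ell_j^{-1} V_j V_j'$ and $\Phi$ is the second order moment matrix
of the weight function in the region R:

$$ \Phi = \int \cdots \int_R uu' \, W(u) \, du. \tag{3.10} $$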

Note immediately that $V_{PC} \le V_{LS}$ for $r \ge 1$ and $V_{RR} \le V_{LS}$ for
k > 0; i.e. both PC and RR result in reductions in integrated
variance over LS. With multicollinear data these reductions can be quite
large since the prediction equations using the biased estimators
either eliminate (PC) or dampen (RR) the first few terms of (3.7)
that contain the largest values of $\ell_j^{-1}$. The magnitude of $V_{PC}$
relative to $V_{RR}$ depends on the number of terms, r, deleted by the PC
estimator of $\beta$ and the value of k selected for the RR estimator.

One of the simplest forms of $\Phi$ in (3.10) occurs when R is a
symmetric region about u = 0 in each $u_j$ and the following two conditions
hold for the weight function W(u):

(i) W(u) is normalized so that $\int \cdots \int_R W(u)\,du = 1$ and is an
even function of each $u_j$; and

(ii) $\int \cdots \int_R u_j^2 \, W(u)\,du = \tau$, a constant, for j = 1, 2, ..., p.

For example, many uniform, triangular, and exponential weight
functions satisfy these requirements. When these conditions are valid

$$ \Phi = \tau I, \tag{3.11} $$


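and the integrated variances become, respectively,

$$ V_{LS} = n^{-1}\sigma^2 + \sigma^2 \tau \sum_{j=1}^{p} \ell_j^{-1}, \tag{3.12} $$

$$ V_{PC} = n^{-1}\sigma^2 + \sigma^2 \tau \sum_{j=r+1}^{p} \ell_j^{-1}, \tag{3.13} $$

and

$$ V_{RR} = n^{-1}\sigma^2 + \sigma^2 \tau \sum_{j=1}^{p} \ell_j(\ell_j + k)^{-2}. \tag{3.14} $$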

Now to further illuminate the tradeoffs in integrated variance
among these three prediction equations, we consider the case of p = 2
predictor variables and $\Phi$ defined as in (3.11). If $r_{12}$ again denotes
the "correlation" between the n observations on the two predictor
variables, then $\ell_1 = 1 - r_{12}$ and $\ell_2 = 1 + r_{12}$ (assuming w.l.o.g.
that $r_{12} > 0$). If the PC estimator deletes the term corresponding
to the smallest latent root, $\ell_1$, then

$$ V^*_{LS} = 2(1 - r_{12}^2)^{-1}, \tag{3.15} $$

$$ V^*_{PC} = (1 + r_{12})^{-1}, \tag{3.16} $$

and

$$ V^*_{RR} = (1 - r_{12})(1 - r_{12} + k)^{-2} + (1 + r_{12})(1 + r_{12} + k)^{-2}, \tag{3.17} $$

where, for each predictor, $V^* = (V - n^{-1}\sigma^2)/\sigma^2\tau$. Figure 1 contains
graphs of $V^*_{LS}$, $V^*_{PC}$, and $V^*_{RR}$ (for several values of k) as a function
of $r_{12}$.


[Insert  Figure  1] 


The comparisons among the predictors that are evident from
Figure 1 include the following:

(i) LS has uniformly larger integrated variance than PC and
RR, with $V^*_{LS}$ asymptotically unbounded as $r_{12} \to 1$;

(ii) for k approximately 0.4 or less the integrated variance
of RR is larger than that of PC except for values of
$r_{12}$ extremely close to 1.0.

Thus for p = 2 predictor variables and $\Phi = \tau I$, both biased estimators
greatly reduce the integrated variance over that obtainable by LS
except for RR when k is small and, simultaneously, $r_{12}$ is not close
to 1.0; i.e. except when k is small and the two predictor variables
are not severely multicollinear. Comparing PC with RR reveals that
$V^*_{PC}$ is generally smaller than $V^*_{RR}$ unless k is relatively large or
$r_{12}$ is extremely close to 1.0.
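
These comparisons can be reproduced numerically from (3.12) - (3.14).
The sketch below works with the scaled quantity
$V^* = (V - n^{-1}\sigma^2)/\sigma^2\tau$ for arbitrary latent roots and
then specializes to p = 2. The function names and the particular
values of k and $r_{12}$ are illustrative choices, not values read
from Figure 1.

```python
def v_star_ls(roots):
    """Scaled integrated variance for LS, from (3.12)."""
    return sum(1.0 / root for root in roots)


def v_star_pc(roots, deleted):
    """Scaled integrated variance for PC, from (3.13); the `deleted` smallest roots are dropped."""
    return sum(1.0 / root for root in sorted(roots)[deleted:])


def v_star_rr(roots, k):
    """Scaled integrated variance for RR, from (3.14), with fixed ridge constant k."""
    return sum(root / (root + k) ** 2 for root in roots)


k = 0.4
for r12 in (0.5, 0.9, 0.999):
    roots = [1.0 - r12, 1.0 + r12]   # latent roots for p = 2
    print(r12, v_star_ls(roots), v_star_pc(roots, 1), v_star_rr(roots, k))
```

For k = 0.4 the RR value exceeds the PC value at $r_{12}$ = 0.5 and 0.9
but falls below it at $r_{12}$ = 0.999, in line with comparison (ii).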

Another comparison between $V_{PC}$ and $V_{RR}$ for p = 2 is presented
in Figure 2, which displays the regions for which $V_{PC} \le V_{RR}$ and
$V_{PC} > V_{RR}$ as a function of $r_{12}$. Again this graph shows that for
small k, $V_{PC} \le V_{RR}$ unless $r_{12}$ is large and that $V_{PC} > V_{RR}$ for large
k over a wide range of $r_{12}$.


4.  INTEGRATED  SQUARED  BIAS


The integrated squared biases of the prediction equation (2.2)
using coefficient estimators (3.4) - (3.6) are, respectively,

$$ B_{LS} = 0, \tag{4.1} $$

$$ B_{PC} = \operatorname{tr}\{V_r V_r' \beta\beta' V_r V_r' \Phi\}
        = \beta' V_r V_r' \Phi V_r V_r' \beta, \tag{4.2} $$

and

$$ B_{RR} = k^2 \operatorname{tr}\{V(L + kI)^{-1}V' \beta\beta' V(L + kI)^{-1}V' \Phi\}
        = k^2 \beta' V(L + kI)^{-1}V' \Phi V(L + kI)^{-1}V' \beta, \tag{4.3} $$

where $V_r = [V_1, V_2, \ldots, V_r]$, $V = [V_1, V_2, \ldots, V_p]$, and
$L = \operatorname{diag}(\ell_1, \ell_2, \ldots, \ell_p)$. Although
the LS estimator contributes no bias to the integrated mean
squared error, the bias contribution of PC and RR to their respective
integrated mean squared errors can be small enough to net great
reductions over $J_{LS}$ due to the large reductions in integrated
variance. This is especially true for severely multicollinear data and
weight functions W(u) that give smallest weights to regions that have
the same multicollinearities as those in the matrix of predictor
variables, X.

If we again examine the characteristics of regions and weight
functions yielding $\Phi = \tau I$, (4.2) and (4.3) reduce to

$$ B_{PC} = \tau \sum_{j=1}^{r} (V_j'\beta)^2 \tag{4.4} $$

and

$$ B_{RR} = \tau k^2 \sum_{j=1}^{p} (\ell_j + k)^{-2} (V_j'\beta)^2. \tag{4.5} $$

It is readily apparent from (4.4) and (4.5) that $B_{PC} \le B_{RR}$ when
$D_p \le 0$ and $B_{PC} > B_{RR}$ when $D_p > 0$, where

$$ D_p = \sum_{j=1}^{r} (1 - a_j^2) Z_j^2 - \sum_{j=r+1}^{p} a_j^2 Z_j^2, \tag{4.6} $$

$Z_j = V_j'\beta$, and $a_j = k/(\ell_j + k)$. The results of the previous section
suggest that large values of k yield an integrated variance for ridge
regression that is smaller than the integrated variance for principal
components. But there is a tradeoff in integrated squared bias since
large values of k imply that $a_j \approx 1$ for latent roots defining
multicollinearities (i.e. for j = 1, 2, ..., r) and hence that

$$ D_p \approx - \sum_{j=r+1}^{p} a_j^2 Z_j^2 < 0. $$

This property is of course weakened if some of the $Z_j^2$, j = 1, 2, ..., r,
are large enough to offset the closeness of the corresponding $a_j$ to 1.
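
The role of $D_p$ in (4.6) can be checked directly, since subtracting
(4.5) from (4.4) gives $B_{PC} - B_{RR} = \tau D_p$. The sketch below
evaluates both sides for one set of inputs. The latent roots, the
values of $Z_j$, $\tau$, and k are hypothetical numbers chosen only
for illustration, and the roots are assumed to be listed in ascending
order so that the first `deleted` entries are the ones PC removes.

```python
def ridge_shrinkage(roots, k):
    """a_j = k / (l_j + k) for each latent root l_j."""
    return [k / (root + k) for root in roots]


def bias_pc(tau, z, deleted):
    """(4.4): tau times the sum of Z_j^2 over the deleted components."""
    return tau * sum(zj * zj for zj in z[:deleted])


def bias_rr(tau, z, roots, k):
    """(4.5) rewritten with a_j: tau times the sum of a_j^2 Z_j^2."""
    a = ridge_shrinkage(roots, k)
    return tau * sum(aj * aj * zj * zj for aj, zj in zip(a, z))


def d_p(z, roots, k, deleted):
    """(4.6): its sign decides which biased estimator has the smaller integrated squared bias."""
    a = ridge_shrinkage(roots, k)
    head = sum((1.0 - a[j] ** 2) * z[j] ** 2 for j in range(deleted))
    tail = sum(a[j] ** 2 * z[j] ** 2 for j in range(deleted, len(roots)))
    return head - tail


# Hypothetical inputs: ascending latent roots, Z_j = V_j' beta, tau from the weight function.
roots = [0.05, 0.95, 2.0]
z = [0.3, 1.0, -2.0]
tau, k, deleted = 1.0 / 3.0, 0.2, 1
difference = bias_pc(tau, z, deleted) - bias_rr(tau, z, roots, k)
print(d_p(z, roots, k, deleted), difference)
```

For these inputs $D_p$ is negative, so PC has the smaller integrated
squared bias; increasing the first entry of `z` eventually reverses
the sign.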


For p = 2 predictor variables and $\Phi = \tau I$,

$$ B_{PC} = \tau(\beta_1 - \beta_2)^2/2 \tag{4.7} $$

$$ B_{RR} = \tau k^2 \{(\beta_1 - \beta_2)^2 (1 - r_{12} + k)^{-2} + (\beta_1 + \beta_2)^2 (1 + r_{12} + k)^{-2}\}/2, \tag{4.8} $$

again assuming r = 1 and $r_{12} > 0$. As a function of $a_1$, $a_2$, $Z_1$, and
$Z_2$, these expressions are

$$ B_{PC} = \tau Z_1^2 \quad \text{and} \quad B_{RR} = \tau\{a_1^2 Z_1^2 + a_2^2 Z_2^2\}. \tag{4.9} $$

Figure 3 depicts regions of the $(Z_1, Z_2)$-plane for which
$B_{PC} < B_{RR}$ for p = 2 predictor variables, $\Phi = \tau I$, $r_{12} = 0.95$, and
four choices of k. Similar regions occur for other values of $r_{12}$;
only the slopes of the lines change. The locus of points for which
$B_{PC} = B_{RR}$ is two lines passing through the origin with slopes
$\pm a_2^{-1}(1 - a_1^2)^{1/2}$.


[Insert Figure 3]

Examination of a prediction equation with p = 3 predictor
variables reveals some general characteristics of integrated bias
comparisons between PC and RR. In this case (recall equation (4.6))

$$ D_3 = \begin{cases}
(1 - a_1^2) Z_1^2 - a_2^2 Z_2^2 - a_3^2 Z_3^2, & r = 1, \\
(1 - a_1^2) Z_1^2 + (1 - a_2^2) Z_2^2 - a_3^2 Z_3^2, & r = 2.
\end{cases} \tag{4.10} $$

The general shape of the region defined by (4.10) when PC deletes
r = 1 terms from (3.4) to obtain (3.5) is that of an elliptical
cone centered on the $Z_1$ axis. Outside this cone $B_{PC} < B_{RR}$, while
inside it $B_{PC} > B_{RR}$; i.e., $B_{PC} < B_{RR}$ unless $Z_1^2$ is sufficiently large.
This comparison generalizes to an arbitrary number of predictor
variables, p, for r = 1.

When p = 3 and r = 2, (4.10) indicates that the general shape of
the region comparing $B_{PC}$ and $B_{RR}$ is again characterized by an
elliptical cone, now centered on the $Z_3$ axis. Inside this cone $B_{PC} < B_{RR}$,
while outside it $B_{PC} > B_{RR}$. So $B_{PC} < B_{RR}$ for arbitrary $Z_3^2$ if neither
$Z_1^2$ nor $Z_2^2$ is too large. This conclusion remains valid for arbitrary
p provided r = p - 1.


5.  INTEGRATED  MEAN  SQUARED  ERROR

The tradeoffs in integrated variance and integrated squared
bias that were uncovered in the previous two sections can be
evaluated by considering the integrated mean squared errors of the
prediction equations:

$$ J_{LS} = n^{-1}\sigma^2 + \sigma^2 \operatorname{tr}\{(X'X)^{-1}\Phi\}, \tag{5.1} $$

$$ J_{PC} = n^{-1}\sigma^2 + \sigma^2 \operatorname{tr}\{(X'X)^{-}\Phi\} + \beta' V_r V_r' \Phi V_r V_r' \beta, \tag{5.2} $$

$$ J_{RR} = n^{-1}\sigma^2 + \sigma^2 \operatorname{tr}\{(X'X + kI)^{-1} X'X (X'X + kI)^{-1}\Phi\}
        + k^2 \beta' V(L + kI)^{-1}V' \Phi V(L + kI)^{-1}V' \beta. \tag{5.3} $$

Rather than attempting a complicated comparison of expressions
(5.1) - (5.3), we again simplify the problem by specifying that
$\Phi = \tau I$. Then,

$$ J_{LS} = \sigma^2 \{n^{-1} + \tau \sum_{j=1}^{p} \ell_j^{-1}\}, \tag{5.4} $$

$$ J_{PC} = \sigma^2 \{n^{-1} + \tau \sum_{j=r+1}^{p} \ell_j^{-1}\} + \tau \sum_{j=1}^{r} Z_j^2, \tag{5.5} $$

and

$$ J_{RR} = \sigma^2 \{n^{-1} + \tau \sum_{j=1}^{p} \ell_j(\ell_j + k)^{-2}\} + \tau k^2 \sum_{j=1}^{p} (\ell_j + k)^{-2} Z_j^2. \tag{5.6} $$

Examination of (5.4) and (5.5) reveals that $J_{LS} \le J_{PC}$ when $E_p \le 0$
and $J_{LS} > J_{PC}$ when $E_p > 0$, where

$$ E_p = \sum_{j=1}^{r} (\sigma^2 \ell_j^{-1} - Z_j^2). \tag{5.7} $$

Comparing (5.4) with (5.6) reveals that $J_{LS} \le J_{RR}$ when $F_p \le 0$ and
$J_{LS} > J_{RR}$ when $F_p > 0$, where

$$ F_p = \sum_{j=1}^{p} (b_j - a_j^2 Z_j^2), \tag{5.8} $$

$b_j = k\sigma^2(2\ell_j + k)/[\ell_j(\ell_j + k)^2]$,
and $a_j = k/(\ell_j + k)$ as in Section 4. Finally, from (5.5) and (5.6),
$J_{PC} \le J_{RR}$ when $D_p \le d_p$ and $J_{PC} > J_{RR}$ when $D_p > d_p$, where $D_p$ is
defined in (4.6) and

$$ d_p = (V_{RR} - V_{PC})/\tau = \sigma^2 \sum_{j=1}^{p} \ell_j(\ell_j + k)^{-2} - \sigma^2 \sum_{j=r+1}^{p} \ell_j^{-1}. \tag{5.9} $$


For the two variable prediction equation considered in the
previous sections,

$$ J^*_{LS} = 2\sigma^2(1 - r_{12}^2)^{-1}, \tag{5.10} $$

$$ J^*_{PC} = \sigma^2(1 + r_{12})^{-1} + Z_1^2, \tag{5.11} $$

and

$$ J^*_{RR} = \sigma^2\{(1 - r_{12})(1 - r_{12} + k)^{-2} + (1 + r_{12})(1 + r_{12} + k)^{-2}\}
          + k^2\{(1 - r_{12} + k)^{-2} Z_1^2 + (1 + r_{12} + k)^{-2} Z_2^2\}, \tag{5.12} $$

where $J^* = (J - n^{-1}\sigma^2)/\tau$. Figures 4 and 5 pictorially reveal the
combined effects of integrated variance and integrated squared bias
of the two variable predictors by plotting (5.10) - (5.12) as a
function of $Z_1^2$, for $\sigma^2 = 1$ (hence $Z_j^2$ measures the magnitude of
$(V_j'\beta)^2$ relative to $\sigma^2$). Two values of $r_{12}$ are used to indicate the
changes in the curves as $r_{12}$ is changed, and only the curves for
$Z_2^2 = 0$ are depicted. Nonzero values of $Z_2^2$ increase the intercept
values for $J^*_{RR}$ but leave the curves for $J^*_{LS}$ and $J^*_{PC}$ unchanged. So
this is a "worst case" comparison of $J^*_{LS}$ and $J^*_{PC}$ with $J^*_{RR}$.

LS  PC  RR 


[Insert  Figures  4 and  5] 


In general, these figures support the contention that reductions
in integrated mean squared error over LS are possible with either
biased estimator provided that $Z_1^2$ is not too large relative to $\sigma^2$
(and to a lesser extent, provided that $Z_2^2$ is not too large for RR).
Due to the magnitude of $V_{LS}$ for the stronger multicollinearity,
$r_{12} = 0.99$, substantial reductions in integrated mean squared error
are seen to be possible with the biased estimators when the predictor
variables are extremely multicollinear. The comparison of $J^*_{PC}$ and
$J^*_{RR}$ indicates that $J^*_{PC} \le J^*_{RR}$ for smaller values of $Z_1^2$, particularly
for small values of k. Large values of $Z_1^2$ or large selections of
k result in smaller integrated mean squared error for RR than PC,
provided a large value of $Z_2^2$ does not compensate for these reductions.
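
The behavior described above can be explored by evaluating
(5.10) - (5.12) directly. The sketch below does so for $\sigma^2 = 1$ and
$Z_2^2 = 0$; the choices $r_{12} = 0.99$ and k = 0.1, like the function
names, are illustrative assumptions and are not necessarily the
settings plotted in Figures 4 and 5.

```python
def j_star_ls(r12, sigma2=1.0):
    """(5.10): scaled integrated mean squared error for LS."""
    return 2.0 * sigma2 / (1.0 - r12 ** 2)


def j_star_pc(r12, z1_sq, sigma2=1.0):
    """(5.11): PC with the term for the smallest latent root deleted."""
    return sigma2 / (1.0 + r12) + z1_sq


def j_star_rr(r12, k, z1_sq, z2_sq=0.0, sigma2=1.0):
    """(5.12): RR with a fixed ridge constant k."""
    small, large = 1.0 - r12 + k, 1.0 + r12 + k
    variance = sigma2 * ((1.0 - r12) / small ** 2 + (1.0 + r12) / large ** 2)
    bias = k ** 2 * (z1_sq / small ** 2 + z2_sq / large ** 2)
    return variance + bias


r12, k = 0.99, 0.1   # illustrative choices only
for z1_sq in (0.0, 1.0, 5.0, 10.0):
    print(z1_sq, j_star_ls(r12), j_star_pc(r12, z1_sq), j_star_rr(r12, k, z1_sq))
```

With these settings PC gives the smaller value for small $Z_1^2$, RR
takes over as $Z_1^2$ grows, and both remain far below LS over the
range shown.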


6.  ESTIMATION 


Helms [10] and Park [22] employ integrated mean squared error
criteria to assess different least squares prediction equations that
arise due to attempts to select acceptable subsets of the original p
predictor variables for use in a final model. Helms [10] uses known
characteristics of the predictor variables to define $\Phi$ and then
estimates the integrated variances of the subset models as a (biased)
mimic of the corresponding integrated mean squared errors. Park
[22], analyzing a p = 3 variable model, determines $\Phi$ by defining
W(u) to be a uniform weight function and R to be the unit cube.
He then estimates integrated mean squared error by evaluating J
for various subsets of the full set of predictor variables using
the least squares estimates of the parameters from the full model.
Both of these procedures for estimating integrated mean squared
error yield biased estimators of J.

In this section we will develop an alternate approach for
estimating integrated mean squared error. We will use characteristics
of the data and each of the three regression estimators discussed
in this paper to define a region of prediction R and an estimator
$\hat{J}$ so that $\hat{J}$ is an unbiased estimator of the corresponding J. An
example illustrating some of the characteristics of this estimation
scheme concludes the section.

Since we are primarily concerned with multicollinear data, a
transformation to the principal axes of X'X rather than using the
original coordinate system allows R to be defined to reflect anomalies
in the data. For example, if X is severely multicollinear
there is very little information in the p dimensional space of the
predictor variables in directions defined by the latent vectors
corresponding to the small latent roots of X'X. The region of
prediction, R, should be chosen to reflect such characteristics
in the data; otherwise one is in danger of extrapolation with the
predictor. So let

\[ t = V'u, \tag{6.1} \]

where u is again a vector of standardized values of the p predictor
variables in the original coordinate system, and t represents this
same point in the orthogonal coordinate system defined by the p
latent vectors of X'X; i.e., $t_j = V_j'u$, $j = 1, 2, \ldots, p$.
Let W*(t) and R* denote weight functions and regions of the predictor
variables, respectively, in the transformed coordinate system.
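
The rotation in (6.1) can be sketched numerically. The following is a minimal illustration, not the authors' computation: the data are synthetic, the helper name `principal_axes` is hypothetical, and it assumes X has been centered and scaled so that X'X is in correlation form, with latent roots ordered from smallest to largest.

```python
import numpy as np

def principal_axes(X):
    """Return latent roots (ascending) and latent vectors (columns) of X'X."""
    roots, V = np.linalg.eigh(X.T @ X)  # eigh returns ascending eigenvalues
    return roots, V

# Synthetic, deliberately collinear predictors (illustrative data only).
rng = np.random.default_rng(0)
z = rng.normal(size=(15, 1))
raw = np.hstack([z + 0.05 * rng.normal(size=(15, 1)),
                 z + 0.05 * rng.normal(size=(15, 1)),
                 rng.normal(size=(15, 1))])

# Standardize so that X'X is the correlation matrix of the predictors.
centered = raw - raw.mean(axis=0)
X = centered / np.sqrt((centered ** 2).sum(axis=0))

roots, V = principal_axes(X)
T = X @ V  # row i holds t_i' = u_i'V, i.e. t_i = V'u_i
```

In these coordinates T'T is diagonal with the latent roots on the diagonal, so a small root marks a direction in which the observed points barely vary.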

Consider now the use of a rectangular region of interest in the
transformed space which is defined by

\[ R^* = \{ t : -s_j \le t_j \le s_j,\ j = 1, 2, \ldots, p \}, \tag{6.2} \]

where $s_j > 0$ will be defined for each estimator and represents the
limits imposed on the use of a prediction equation in each direction
of the transformed space of predictor variables. For illustrative
purposes, a uniform weight function, W*(t), will be used to discuss
the estimation of J although, as mentioned in Section 3, many other
weight functions behave similarly. Accordingly, define

\[ W^*(t) = \begin{cases} 3 \prod_{j=1}^{p} (2 s_j)^{-1}, & t \in R^* \\ 0, & t \notin R^*. \end{cases} \tag{6.3} \]

From (6.2) and (6.3),

\[ \Phi = \int \cdots \int_{R^*} t t' W^*(t)\, dt = \mathrm{diag}(s_1, s_2, \ldots, s_p). \tag{6.4} \]

Noting that in the transformed space
$(X'X)^{-1} = \mathrm{diag}(\ell_1^{-1}, \ell_2^{-1}, \ldots, \ell_p^{-1})$,
we find from (5.1) that

\[ J_{LS} = n^{-1}\sigma^2 + \sigma^2 \sum_{j=1}^{p} \ell_j^{-1} s_j, \tag{6.5} \]

and, hence, that an unbiased estimator of (6.5), regardless of the
choice of the $s_j$ (provided they are not random variables), is

\[ \hat{J}_{LS} = n^{-1}\hat{\sigma}^2 + \hat{\sigma}^2 \sum_{j=1}^{p} \ell_j^{-1} s_j, \tag{6.6} \]

where $\hat{\sigma}^2 = MSE$ is the usual unbiased estimator of $\sigma^2$.
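
As a rough numerical check of (6.6), the sketch below evaluates the estimator for the two variable model of the example, taking the latent roots, the LS limits, and the rounded residual mean squared error (2.49, n = 15) from the values reported later in this section. The function name `j_hat_ls` is hypothetical. Because the inputs are rounded, the result should only be expected to agree with the tabled value to about two decimals.

```python
def j_hat_ls(mse, roots, s, n):
    """Estimate (6.6): mse/n + mse * sum_j s_j / l_j."""
    return mse / n + mse * sum(sj / lj for sj, lj in zip(s, roots))

# Two variable model: latent roots and LS limits as tabled, MSE = 2.49, n = 15.
j_ls = j_hat_ls(2.49, [0.8567, 1.1433], [0.3521, 0.3927], 15)
```

The second term grows quickly when a limit s_j is large relative to its latent root, which is how multicollinearity inflates the LS value.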

Similarly, in the transformed space
$(X'X)^{+} = \mathrm{diag}(0, 0, \ldots, 0, \ell_{r+1}^{-1}, \ldots, \ell_p^{-1})$
and (5.2) becomes

\[ J_{PC} = n^{-1}\sigma^2 + \sigma^2 \sum_{j=r+1}^{p} \ell_j^{-1} s_j + \sum_{j=1}^{r} (V_j'\beta)^2 s_j. \tag{6.7} \]

We again estimate $n^{-1}\sigma^2$ by $n^{-1}\hat{\sigma}^2$, but rather
than use $\hat{\sigma}^2$ in the second term of (6.7), we will use
$MSE_{PC}$, defined as

\[
\begin{aligned}
MSE_{PC} &= \{ Y'(I - n^{-1}11' - X(X'X)^{+}X')Y \} / (n-p-1+r) \\
         &= \Big\{ (n-p-1)\hat{\sigma}^2 + \sum_{j=1}^{r} \ell_j^{-1} c_j^2 \Big\} / (n-p-1+r),
\end{aligned}
\]

i.e. let

\[ \hat{J}_{PC} = n^{-1}\hat{\sigma}^2 + MSE_{PC} \sum_{j=r+1}^{p} \ell_j^{-1} s_j. \]


The bias of $\hat{J}_{PC}$ is

\[
E[\hat{J}_{PC}] - J_{PC} = \Big\{ \sum_{j=r+1}^{p} \ell_j^{-1} s_j \Big\}
\Big\{ \sum_{j=1}^{r} \ell_j (V_j'\beta)^2 / (n-p-1+r) \Big\}
- \sum_{j=1}^{r} (V_j'\beta)^2 s_j.
\]

If we now restrict the region of prediction R* so that

\[
s_j = \begin{cases}
\ell_j / (n-p-1+r), & j = 1, 2, \ldots, r \\
\ell_j / (p-r), & j = r+1, r+2, \ldots, p,
\end{cases} \tag{6.9}
\]

then $E[\hat{J}_{PC}] = J_{PC}$. Note that (6.9) does restrict prediction most
in directions for which there is little information on the predictor
variables (i.e. directions defined by latent vectors corresponding
to small latent roots) and least in directions for which there is
the most information.
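
A small sketch of the PC limits may help fix ideas. It assumes the restriction reads s_j = l_j/(n-p-1+r) for the r deleted components and s_j = l_j/(p-r) for the retained ones, with latent roots in ascending order, and it assumes r = 1 for both subset models of the example, which is the value that reproduces the PC limits reported in the tables. The function name `pc_limits` is hypothetical.

```python
def pc_limits(roots, r, n):
    """PC limits: roots ascending, first r components deleted."""
    p = len(roots)
    return [lj / (n - p - 1 + r) if j < r else lj / (p - r)
            for j, lj in enumerate(roots)]

# Latent roots as tabled for the two and four variable models, n = 15.
s_two = pc_limits([0.8567, 1.1433], r=1, n=15)
s_four = pc_limits([0.0115, 0.2355, 0.8574, 2.8956], r=1, n=15)

# Under this restriction the retained-component sum of s_j / l_j equals 1.
retained_sum = sum(sj / lj for sj, lj in
                   zip(s_four[1:], [0.2355, 0.8574, 2.8956]))
```

With these limits the first factor of the bias expression is 1, so the two bias terms cancel component by component.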


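The integrated mean squared error of the RR estimator using
(6.4) is given by

\[
J_{RR} = n^{-1}\sigma^2 + \sigma^2 \sum_{j=1}^{p} \ell_j (\ell_j + k)^{-2} s_j
+ k^2 \sum_{j=1}^{p} (\ell_j + k)^{-2} (V_j'\beta)^2 s_j, \tag{6.10}
\]

for nonstochastic choices of k. Define (note that this is not the
usual mean squared error definition of the ridge estimator)

\[
\begin{aligned}
MSE_{RR} &= \{ Y'(I - n^{-1}11' - X(X'X + kI)^{-1}X')Y \} \Big/ \Big( n - 1 - \sum_{j=1}^{p} \ell_j (\ell_j + k)^{-1} \Big) \\
         &= \Big\{ (n-p-1)\hat{\sigma}^2 + k \sum_{j=1}^{p} \ell_j^{-1} (\ell_j + k)^{-1} c_j^2 \Big\} \Big/ \Big( n - 1 - \sum_{j=1}^{p} \ell_j (\ell_j + k)^{-1} \Big).
\end{aligned} \tag{6.11}
\]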




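Estimating $n^{-1}\sigma^2$ by $n^{-1}\hat{\sigma}^2$ and $\sigma^2$ in the
second term of (6.10) by $MSE_{RR}$, as was done for PC, gives an
unbiased estimator $\hat{J}_{RR}$ of $J_{RR}$ if we restrict R* so that

\[ s_j = k^{-1} \ell_j (\ell_j + k) \Big/ \Big( n - 1 - \sum_{i=1}^{p} \ell_i (\ell_i + k)^{-1} \Big), \tag{6.12} \]

and also use

\[ k = p/(n-1). \tag{6.13} \]

It is especially important to observe that in obtaining an unbiased
estimator of $J_{RR}$, a nonstochastic rule for selecting the ridge
shrinkage parameter k resulted.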
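
The ridge rule can be checked numerically with a short sketch. It assumes the limits take the form s_j = l_j(l_j + k) / (k D) with D = n - 1 - sum_i l_i/(l_i + k), that k = p/(n - 1), and that the latent roots sum to p (correlation form). The function name `ridge_limits` is hypothetical, and the roots are those tabled for the two variable model of the example.

```python
def ridge_limits(roots, n):
    """Return k = p/(n-1) and the ridge limits s_j for ascending roots."""
    p = len(roots)
    k = p / (n - 1)
    denom = n - 1 - sum(lj / (lj + k) for lj in roots)
    s = [lj * (lj + k) / (k * denom) for lj in roots]
    return k, s

roots_two = [0.8567, 1.1433]
k, s = ridge_limits(roots_two, n=15)

# Sum that multiplies the variance estimate in the second term of J_RR.
weight = sum(lj * sj / (lj + k) ** 2 for lj, sj in zip(roots_two, s))
```

Under these assumptions `weight` comes out equal to 1, which appears to be the role of the particular choice of k: only then does the expectation of the substituted variance estimate reproduce the bias term of the ridge integrated mean squared error.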

To illustrate the use of these estimators we will examine the
nine-variable data analyzed by Webster, Gunst, and Mason [23]. A
least squares backward elimination of this data resulted in a final
predictor involving four of the nine predictor variables: $X_1$, $X_4$,
$X_6$, and $X_8$, with $X_1$ and $X_4$ having a large pairwise
multicollinearity ($r_{14} = 0.978$). Using a latent root regression
backward elimination procedure (Webster, Gunst, and Mason [23, 24]),
the final predictor contains only two predictor variables, $X_6$ and
$X_9$, which do not appear strongly multicollinear ($r_{69} = 0.143$).
Both these subset predictors appear to be reasonably adequate
predictors of the n = 15 observed responses (coefficients of
determination for the two models are 0.80 and 0.75, respectively, and
the residual mean squared errors are 2.40 and 2.49, while the
corresponding statistics for the full model are 0.83 and 4.12).

Tables 1 and 2 contain the statistics used in the evaluation
of the integrated mean squared errors of the prediction equations
for each of the above subset models. The $s_j$ values for RR and PC
were obtained from (6.12) and (6.9), respectively, while those for
LS were chosen to be the smaller (in absolute value) of the upper
and lower bounds on the observed $t_{ij} = V_j'u_i$, i = 1, 2, ..., n
(observe that these values are merely the observed principal
components of X). This choice of $s_j$ for LS ensures that we do not
attempt to extrapolate along the axes in the transformed space.
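
The LS choice of limits amounts to taking, in each transformed direction, the smaller absolute bound of the observed principal component scores. A minimal sketch follows, using the score ranges reported for the two variable model; the function name `ls_limits` is hypothetical.

```python
def ls_limits(lower, upper):
    """Smaller absolute bound of the observed scores in each direction."""
    return [min(abs(lo), abs(hi)) for lo, hi in zip(lower, upper)]

# Observed ranges of the scores in the two transformed directions.
s_ls = ls_limits([-0.4240, -0.3927], [0.3521, 0.6324])
```

Taking the smaller bound keeps the symmetric region inside the observed range on both sides of the origin.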


TABLE 1. STATISTICS FOR THE TWO VARIABLE MODEL: $X_6$, $X_9$.

                        Values of s_j            Range on t_ij
  j     l_j         RR       PC       LS        LOWER     UPPER

  1    .8567      .4892    .0659    .3521      -.4240     .3521
  2   1.1433      .8399   1.1433    .3927      -.3927     .6324

  $\hat{J}_{LS}$ = 2.0406    $\hat{J}_{PC}$ = 7.8419    $\hat{J}_{RR}$ = 3.5931 (k = 0.1429)

TABLE 2. STATISTICS FOR THE FOUR VARIABLE MODEL: $X_1$, $X_4$, $X_6$, $X_8$.

                        Values of s_j            Range on t_ij
  j     l_j         RR       PC       LS        LOWER     UPPER

  1    .0115      .0010    .0010    .0439      -.0708     .0439
  2    .2355      .0362    .0785    .1718      -.2771     .1718
  3    .8574      .2895    .2858    .2715      -.4816     .2715
  4   2.8956     2.7210    .9653    .8030      -.8030     .8454

  $\hat{J}_{LS}$ = 12.5227    $\hat{J}_{PC}$ = 3.1877    $\hat{J}_{RR}$ = 4.1786 (k = 0.2857)

Consider first the two variable model. The region of prediction
for PC is greatly distorted when comparison is made with the range
on $t_{ij}$ for each of the two dimensions. The region of prediction
for RR allows a great deal of extrapolation in the second dimension.
The estimates of J indicate that the LS prediction equation should
be preferred over PC and RR, with the PC predictor clearly inferior
to the other two. These results are not especially surprising since
there is no strong multicollinearity in this model.

The statistics presented in Table 2 point out the advantages
of using either biased estimator of $\beta$ in a prediction equation when
the data is multicollinear. The regions of prediction for PC and
RR are conservative in the first two dimensions when compared with
the range on the $t_{ij}$. The range of prediction for RR in the last
dimension is quite anticonservative. The estimated integrated mean
squared errors of PC and RR indicate that if prediction is confined
to the regions in Table 2, both biased predictors greatly reduce
the overall variability of the LS predictor.


7.  SUMMARY  AND  RECOMMENDATIONS 


The example in the previous section suggests several questions
which need to be answered before a comparison of LS, PC, and RR
prediction equations can be regarded as conclusive. First, the
restriction to an unbiased estimator of J may be detrimental to
the evaluation of the ridge prediction equation. The values of
$s_j$ for the largest dimension in each subset model were much larger
than the bounds of the data in these directions. Yet a smaller
value of $s_j$ renders $\hat{J}_{RR}$ biased for $J_{RR}$. Further
investigations on the desirability of unbiased estimators of J are
needed to resolve this problem.

In  addition  to  the  estimation  problems,  additional  choices  of 
weight functions should be considered. For example, letting $\Phi$ = X'X
deserves  consideration,  as  Helms  [9]  argues.  Finally,  how  should 
comparisons  be  made  if  the  model  is  misspecified?  These  problems 
are  currently  under  investigation  and  additional  results  will  be 
reported  in  the  near  future. 

The unresolved questions just raised do not detract from the
main thrust of this paper: integrated mean squared error is a
flexible tool for evaluating competing prediction equations. The
general formulation of applying this criterion to least squares,
principal component, and ridge regression prediction equations has
been presented for a correctly specified model, and comparisons have
been made among the predictions for a general class of weight
functions over a specific region of interest. By allowing the regions
of prediction to vary, unbiased estimators of the integrated mean
squared error of the three prediction equations were obtained and
a numerical example illustrating the use of the procedures was
discussed.